Non-identifiable Pedigrees and a Bayesian Solution
Author: B. Kirkpatrick
Abstract
Some methods aim to correct or test for relationships, or to reconstruct the pedigree (family tree). We show that these methods cannot resolve ties between candidate relationships because of the non-identifiability of the pedigree likelihood, which is the probability of inheriting the data under the pedigree model. This means that no likelihood-based method can produce a correct pedigree inference with high probability. This lack of reliability is critical for both health and forensics applications. Pedigree inference methods use a structured machine learning approach in which the objective is to find the pedigree graph that maximizes the likelihood. Known pedigrees are useful for both association and linkage analysis, which aim to find the regions of the genome associated with the presence or absence of a particular disease. Errors in pedigree prediction therefore have dramatic effects on downstream analysis. In this paper we present the first discussion of multiple typed individuals in non-isomorphic pedigrees, P and Q, whose likelihoods are non-identifiable: Pr[G | P, θ] = Pr[G | Q, θ] for all input data G and all recombination rate parameters θ. While non-identifiable pairs were previously known, we give an example having data for multiple individuals. Additionally, we provide a deeper understanding of the general discrete structures driving these non-identifiability examples, as well as results to guide algorithms that wish to examine only identifiable pedigrees. This paper introduces a general criterion for establishing whether a pair of pedigrees is non-identifiable and two easy-to-compute criteria guaranteeing identifiability. Finally, we suggest a method for dealing with non-identifiable likelihoods: use Bayes' rule to obtain the posterior from the likelihood and prior. We propose a prior guaranteeing that the posterior distinguishes all pairs of pedigrees.
Shortened version published as: B. Kirkpatrick. Non-identifiable pedigrees and a Bayesian solution. Int. Symp. on Bioinformatics Res. and Appl. (ISBRA), 7292:139-152, 2012.
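The Bayesian step described in the abstract can be illustrated with a minimal sketch. It assumes a finite set of two candidate pedigrees labeled P and Q, and it uses made-up likelihood and prior values; these numbers and the helper name `posterior` are hypothetical and are not the prior proposed in the paper. The point is only that when Pr[G | P, θ] = Pr[G | Q, θ], the posterior ratio equals the prior ratio, so any prior assigning different mass to P and Q breaks the tie.

```python
from fractions import Fraction


def posterior(likelihoods, priors):
    """Apply Bayes' rule over a finite set of candidate pedigrees.

    likelihoods[name] stands for Pr[G | pedigree, theta];
    priors[name] stands for Pr[pedigree].
    """
    unnormalized = {name: likelihoods[name] * priors[name] for name in likelihoods}
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}


# Hypothetical values: equal likelihoods model a non-identifiable pair.
likelihoods = {"P": Fraction(1, 8), "Q": Fraction(1, 8)}
# Hypothetical prior that assigns different mass to the two pedigrees.
priors = {"P": Fraction(2, 3), "Q": Fraction(1, 3)}

post = posterior(likelihoods, priors)
print(post["P"], post["Q"])
```

With equal likelihoods the likelihood term cancels in the normalization, so the posterior here simply reproduces the prior. A uniform prior would leave the two pedigrees tied, which is why the choice of prior matters.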
Similar resources
Sparse Linear Identifiable Multivariate Modeling
In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component δ-function and conti...
Maximum likelihood pedigree reconstruction using integer programming
Pedigrees are ‘family trees’ relating groups of individuals which can usefully be seen as Bayesian networks. The problem of finding a maximum likelihood pedigree from genotypic data is encoded as an integer linear programming problem. Two methods of ensuring that pedigrees are acyclic are considered. Results on obtaining maximum likelihood pedigrees relating 20, 46 and 59 individuals a...
Maximum likelihood haplotyping for general pedigrees.
Haplotype data is valuable in mapping disease-susceptibility genes in the study of Mendelian and complex diseases. We present algorithms for inferring a most likely haplotype configuration for general pedigrees, implemented in the newest version of the genetic linkage analysis system SUPERLINK. In SUPERLINK, genetic linkage analysis problems are represented internally using Bayesian networks. T...
Comparison of Estimates Using Record Statistics from Lomax Model: Bayesian and Non Bayesian Approaches
This paper addresses the problem of Bayesian estimation of the parameters, reliability and hazard function in the context of record statistics values from the two-parameter Lomax distribution. The ML and the Bayes estimates based on records are derived for the two unknown parameters and the survival time parameters, reliability and hazard functions. The Bayes estimates are obtained based on conju...
On the choice of parameterisation and priors for the Bayesian analyses of Mendelian randomisation studies.
Mendelian randomisation is a form of instrumental variable analysis that estimates the causal effect of an intermediate phenotype or exposure on an outcome or disease in the presence of unobserved confounding, using a genetic variant as the instrument. A Bayesian approach allows current knowledge to be incorporated into the analysis in the form of informative prior distributions, and the unobse...